Multi-objective infinite-horizon discounted Markov decision processes



Similar resources

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

We consider infinite-horizon γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies π1, ..., πk it implicitly generates until some iteration k. We provide performance bounds for non-stationary policies involving the last m generated policies that reduce the state-of-t...


Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...


Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision Processes

We consider a nonhomogeneous stochastic infinite horizon optimization problem whose objective is to minimize the overall average cost per-period of an infinite sequence of actions (average optimality). Optimal solutions to such problems will in general be non-stationary. Moreover, a solution which initially makes poor decisions, and then selects wisely thereafter, can be average optimal. Howeve...


Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown, Smith, and Sun (2010). This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs a...


Multi-objective discounted dynamic programming The Neighbour Search approach to construct Pareto sets of multi-objective Markov Decision Processes

The Neighbour Search (NS) algorithm is an iterative method for constructing Pareto sets of multi-dimensional polytopes. An NS iteration consists of two steps: Edges Exploration and Neighbour Detection. Edges Exploration takes a Pareto vertex and determines all Pareto edges connecting such a Pareto vertex to its neighbours. Each neighbour is again a Pareto vertex that is obtained by Neighbour De...



Journal

Journal title: Journal of Mathematical Analysis and Applications

Year: 1982

ISSN: 0022-247X

DOI: 10.1016/0022-247x(82)90122-6